Abstract. A new method for analyzing high-dimensional categorical data, 
Linear Latent Structure (LLS) analysis, is presented. LLS models belong to 
the family of latent structure models, which are mixture distribution models 
constrained to satisfy the local independence assumption. LLS analysis explic- 
itly considers a family of mixed distributions as a linear space and LLS models 
are obtained by imposing linear constraints on the mixing distribution. 

LLS models are identifiable under modest conditions and are consistently 
estimable. A remarkable feature of LLS analysis is the existence of a high- 
performance numerical algorithm, which reduces parameter estimation to a 
sequence of linear algebra problems. Preliminary simulation experiments with 
a prototype of the algorithm demonstrated a good quality of restoration of 
model parameters. 



1. Introduction 

We present a new statistical method, Linear Latent Structure (LLS) analysis,
which belongs to the domain of latent structure analysis. In presenting the
method and investigating its properties, we follow a new understanding of
latent structure models as mixture distribution models (Bartholomew, 2002).



Latent structure analysis considers a number of categorical variables measured
on each individual in a sample, and it aims to discover properties of a
population as well as properties of the individuals composing that population.
The main assumption of latent structure analysis is the local independence
assumption. Formulated in a more contemporary way, this means that the observed
joint distribution of categorical random variables is a mixture of independent
distributions (see section 2 for more detail). The mixing distribution is
considered as a distribution of latent variable(s), which are thought of as
containing hidden information regarding the phenomenon under consideration. The
goal of latent structure analysis is to discover properties of the latent
variables; different approaches to this problem are described in Lazarsfeld
(1950b,a); Lazarsfeld and Henry (1968); Goodman (1978); Langeheine and Rost
(1988); Clogg (1995); Heinen (1996); Bartholomew and Knott (1999); Marcoulides
and Moustaki (2002).

The various branches of latent structure analysis differ in additional
assumptions regarding latent variables or, equivalently, regarding the mixing
distribution. Latent class analysis assumes that the mixing distribution is
concentrated in a finite number of points (called "latent classes"). Latent
trait analysis (LTA) tries to represent mixed distributions as a function of a
latent trait, which in most cases is assumed to be a one-dimensional parameter.
In the 1990s, multidimensional latent traits were investigated (Hoijtink and
Molenaar, 1997; Reckase, 1997). However, the application of multidimensional
LTA is not as broad as that of one-dimensional LTA, since estimating parameters
in the multidimensional case requires additional assumptions.

The novelty of our approach lies in consideration of the space of mixed
distributions as a linear space. This allows us to employ geometric intuition
and clearly formulate the main additional assumption of LLS: that the mixing
distribution is supported by a linear subspace of the space of independent
distributions.

Models arising from this approach are identifiable under modest conditions and
are consistently estimable. Further, there exists a high-performance numerical
algorithm, which reduces estimation of model parameters to a sequence of linear
algebra problems. Preliminary simulation experiments with a prototype of the
algorithm (presented in section 6) demonstrated a good quality of restoration
of model parameters.

The word "linear" in the name of the method reflects at least three aspects of 
our method: first, the model is obtained by imposing linear constraints on the 
mixing distribution; second, the algorithm for model construction uses methods 
of linear algebra; and third, the most interesting ideas of our approach arise from 
consideration of a space of distributions as a linear space. 

Historically, the predecessor of LLS analysis was Grade of Membership (GoM)
analysis, which was introduced in Woodbury and Clive (1974); see also Manton et
al. (1994) for detailed exposition and additional references. Our work on LLS
analysis originated from attempts to find conditions for consistency of GoM
estimators. The development eventually led to a new class of models, which
differ from GoM models in the way the model is formulated, in the methods of
model estimation, and in the meaning and interpretation of the estimators. We
present here this new class of models under the name "linear latent structure
analysis."

The present article concentrates on the exposition of the main ideas of LLS
analysis and the investigation of its statistical properties. Section 2
describes the basics of LLS analysis. A new procedure for parameter estimation
is explained in Section 3. Sections 4 and 5 answer the questions concerning
identifiability of LLS models and consistency of the estimators. Section 6
gives results of preliminary experiments with a prototype of the algorithm
implemented by the authors. The article is concluded by section 7, where we
discuss some interesting properties of LLS analysis and compare it with other
kinds of latent structure analysis.



2. LLS models

The input to LLS analysis is the outcome of $J$ categorical measurements, each
made on $N$ individuals.

Mathematically, we consider $J$ categorical random variables $X_1, \dots, X_J$.
The set of possible values of random variable $X_j$ is $\{1, \dots, L_j\}$. This
structure is described by an integer vector $L = (L_1, \dots, L_J)$. Two
numbers, which are used frequently in the rest of the article, are associated
with that structure: $|L| = L_1 + \dots + L_J$ and
$|L^*| = L_1 \cdot \ldots \cdot L_J$.



LINEAR LATENT STRUCTURE ANALYSIS 






To denote response patterns, we use integer vectors
$\ell = (\ell_1, \dots, \ell_J)$. The $j$th component, $\ell_j$, represents the
value of random variable $X_j$ (thus, $\ell_j \in \{1, \dots, L_j\}$). Note
that there are $|L^*|$ different vectors $\ell$.

The joint distribution of random variables $X_1, \dots, X_J$ is given by
$|L^*|$ elementary probabilities

(1)   $p_\ell = P(X_1 = \ell_1 \text{ and } \dots \text{ and } X_J = \ell_J)$

In general, no restrictions other than $p_\ell \ge 0$ for all $\ell$ and
$\sum_\ell p_\ell = 1$ are imposed on the family of elementary probabilities;
thus, one needs $|L^*| - 1$ parameters to describe a joint distribution of
$X_1, \dots, X_J$.

Note that probabilities $p_\ell$ are directly estimable from the observed data:
frequencies $f_\ell = N_\ell / N$ (where $N$ is the total number of individuals
in the sample and $N_\ell$ is the number of individuals who responded with
response pattern $\ell$) are consistent and efficient estimators for $p_\ell$.
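
The frequency estimator can be sketched in a few lines of Python. This is only
an illustration: it assumes the sample is stored as a list of response patterns
(tuples of 1-based outcomes), and the name `pattern_frequencies` and the toy
sample are hypothetical, not taken from the authors' implementation.

```python
from collections import Counter

def pattern_frequencies(responses):
    """Return {pattern: N_pattern / N} for a list of response patterns."""
    n = len(responses)
    counts = Counter(tuple(r) for r in responses)
    return {pattern: c / n for pattern, c in counts.items()}

# Hypothetical sample: N = 5 individuals, J = 2 binary measurements.
sample = [(1, 1), (1, 2), (1, 1), (2, 2), (1, 1)]
freq = pattern_frequencies(sample)
```

Patterns that never occur in the sample are simply absent from the dictionary,
which corresponds to an estimated probability of zero.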

Among all joint distributions one can distinguish independent distributions,
i.e. distributions in which random variables $X_1, \dots, X_J$ are mutually
independent. This means that for every set of indices $j_1, \dots, j_p$ and for
every response pattern $\ell$ the relation

(2)   $P(X_{j_1} = \ell_{j_1} \text{ and } \dots \text{ and } X_{j_p} = \ell_{j_p}) = P(X_{j_1} = \ell_{j_1}) \cdot \ldots \cdot P(X_{j_p} = \ell_{j_p})$

holds. Equation (2) allows us to describe an independent distribution using
fewer parameters. Namely, let $\beta_{jl} = P(X_j = l)$. Then for every response
pattern $\ell$,

(3)   $p_\ell = \prod_{j=1}^{J} \beta_{j\ell_j}$
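
As a small numerical illustration of the product formula (3), the sketch below
computes the probability of one response pattern for an independent
distribution and checks that the probabilities of all patterns sum to 1. The
numbers in `beta` and the function name are made up for the example.

```python
from itertools import product
from math import prod

def independent_pattern_prob(beta, pattern):
    """Product over j of beta[j][l_j - 1]; outcomes in `pattern` are 1-based."""
    return prod(beta[j][l - 1] for j, l in enumerate(pattern))

# Hypothetical independent distribution with J = 3 and L = (2, 2, 3).
beta = [[0.3, 0.7], [0.5, 0.5], [0.2, 0.3, 0.5]]

p = independent_pattern_prob(beta, (2, 1, 3))  # 0.7 * 0.5 * 0.5

# Sum over all |L*| = 2 * 2 * 3 = 12 response patterns.
all_patterns = product(*[range(1, len(b) + 1) for b in beta])
total = sum(independent_pattern_prob(beta, pat) for pat in all_patterns)
```

Here the distribution over 12 patterns is described by the 7 numbers in `beta`
(of which only 4 are free), instead of 11 free elementary probabilities.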

Thus, every independent distribution can be identified with a point
$\beta = (\beta_{jl})_{jl} \in \mathbb{R}^{|L|}$. Not every point
$\beta \in \mathbb{R}^{|L|}$ corresponds to a probability distribution; to
describe a distribution, $\beta$ must satisfy the conditions:

(4)   $\sum_{l=1}^{L_j} \beta_{jl} = 1$ for every $j$
      $\beta_{jl} \ge 0$ for every $j$ and $l$

Conditions (4) define a convex $(|L| - J)$-dimensional polyhedron in
$\mathbb{R}^{|L|}$, which we denote $\mathbb{S}^L$.

Now we are ready to formulate the first assumption of LLS analysis:

(G1) The observed distribution is a mixture of independent distributions, i.e.
there exists a probabilistic measure $\mu_\beta$, supported by $\mathbb{S}^L$,
such that for every response pattern $\ell$

(5)   $p_\ell = \int \left( \prod_{j=1}^{J} \beta_{j\ell_j} \right) \mu_\beta(d\beta)$

Remark 2.1. Here and later, we use measures instead of probability density
functions or cumulative distribution functions, as this allows us to avoid
discussion of possible singularities and specifics of the space on which the
probability distribution is defined. This is just a matter of convenience; the
above integral can be written as
$\int \prod_{j=1}^{J} \beta_{j\ell_j} \, dF(\beta)$, where $F(\beta)$ is the
cumulative distribution function of the mixing distribution.
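
When the mixing measure is concentrated on finitely many points, the integral
in (5) reduces to a weighted sum. The sketch below uses this special case (an
assumption made only to keep the example computable; the points and weights are
invented) and shows that a mixture of independent distributions is in general
not itself independent.

```python
from math import prod

def mixture_pattern_prob(points, weights, pattern):
    """Weighted sum of independent-distribution probabilities of `pattern`."""
    return sum(
        w * prod(beta[j][l - 1] for j, l in enumerate(pattern))
        for beta, w in zip(points, weights)
    )

# Two hypothetical independent distributions for J = 2 binary variables.
points = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.2, 0.8], [0.3, 0.7]],
]
weights = [0.5, 0.5]

p11 = mixture_pattern_prob(points, weights, (1, 1))

# Marginal probabilities of outcome 1 for each variable in the mixture.
marg1 = sum(w * b[0][0] for b, w in zip(points, weights))
marg2 = sum(w * b[1][0] for b, w in zip(points, weights))
```

In this example `p11` differs from `marg1 * marg2`, so the observed variables
are dependent even though they are independent within each component.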

Assumption (G1) is a cornerstone of latent structure analysis (often, it is
called the local independence assumption). There are a lot of excellent books
and articles devoted to latent structure analysis; we refer to Lazarsfeld
(1950b,a); Lazarsfeld and Henry (1968); Goodman (1978); Langeheine and Rost
(1988); Clogg (1995); Heinen (1996); Bartholomew and Knott (1999); Marcoulides
and Moustaki (2002) for discussion of the meaning and applicability of this
assumption.

Mathematically, the assumption (G1) alone does not imply much. For almost every
distribution $(p_\ell)_\ell$ there exist infinitely many mixing distributions
$\mu_\beta$ that produce the same observed distribution. Thus, one needs more
assumptions to make the model identifiable. In latent structure analysis, such
assumptions are usually formulated in the form of restrictions on the support
of the mixing distribution $\mu_\beta$.

The specific assumption of LLS analysis is:

(G2) The mixing distribution $\mu_\beta$ is supported by a linear subspace $Q$
of $\mathbb{R}^{|L|}$.

For comparison, the corresponding assumption of latent class analysis is that
$\mu_\beta$ is supported by a finite number of points (latent classes).

When the dimensionality of $Q$ is sufficiently smaller than $|L|$, the LLS
model is almost surely identifiable and consistently estimable from data. This
is discussed in subsequent sections.

Informally, the existence of low-dimensional support of measure $\mu_\beta$
means that all measurements reflect the same underlying hidden entity. In
Kovtun et al. (2005) we have shown that the existence of low-dimensional
support is equivalent to the existence of a $K$-dimensional random vector $G$
such that regressions of all indicator random vectors $Y_j$ on $G$ are linear.
($Y_j = (Y_{j1}, \dots, Y_{jL_j})$, where $Y_{jl} = 1$ if $X_j = l$; otherwise,
$Y_{jl} = 0$.)

Distributions satisfying the condition (G2) may be expected when random
variables $X_1, \dots, X_J$ represent responses to survey or exam questions.
Here, questions are intentionally chosen to discover a single (potentially
multidimensional) quantity, such as "quality of life" or "mathematical
knowledge". This is the natural domain of applications of latent structure
analysis in general, and LLS analysis in particular.

We say that a distribution is generated by a $K$-dimensional LLS model if it
can be represented as a mixed distribution satisfying (G2) with $\dim(Q) = K$.



3. Estimation of LLS model

"To define an LLS model" means to define the mixing distribution $\mu_\beta$,
which, in turn, means specifying the supporting subspace $Q$ and the
distribution over it.

The supporting subspace may be consistently estimated from the observed data, 
i.e. the estimated subspace converges to the true one when sample size tends to 
infinity. The identifiability conditions are rather straightforward: if the dimen- 
sionality of supporting subspace is of order of ( ^ 2 ~ J —vaaxLjj or smaller, the 

supporting subspace is almost surely identifiable (theorem 4.4).

LLS analysis uses a nonparametric approach to the description of the mixing
distribution. Thus, the knowledge about the mixing distribution is expressed in
the form of a family of conditional moments of order up to $J$. Using these
moments, the mixing distribution may be approximated as an empirical
distribution. The examples given in section 6 demonstrate the goodness of such
approximation; see also section 7 for discussion of properties of this
approximation.

The technical details of what follows are given in Kovtun et al. (2005). Here
we formulate the most important facts and pay more attention to the most
significant statistical properties, such as identifiability of the model and
consistency of the estimates.

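Let $K$ be the dimensionality of $Q$ and let
$\lambda^1 = (\lambda^1_{jl})_{jl}, \dots, \lambda^K = (\lambda^K_{jl})_{jl}$ be
a basis of $Q$. Let $g = (g_1, \dots, g_K)$ be coordinates of points of $Q$
written in the basis $\lambda^1, \dots, \lambda^K$. This means that for points
contained in $Q$, the coordinates $\beta$ and $g$ are connected as:

(6)   $\beta_{jl} = \sum_{k=1}^{K} \lambda^k_{jl} \cdot g_k$

or, in matrix form, $\beta = \Lambda g$, where $\Lambda$ is the $|L| \times K$
matrix $\Lambda = (\lambda^k_{jl})^k_{jl}$.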

Recall that the support of the mixing measure $\mu_\beta$ is also restricted to
$\mathbb{S}^L$ (a polyhedron defined by conditions (4)), i.e. $\mu_\beta$ is
supported by the intersection of $Q$ and $\mathbb{S}^L$. We consider only bases
in which all $\lambda^k$ belong to $\mathbb{S}^L$. In this case, coordinates
$g$ of points belonging to the support of $\mu_\beta$ satisfy
$g_1 + \dots + g_K = 1$; thus, $g$ are homogeneous coordinates of points from
$Q \cap \mathbb{S}^L$. It is possible to exclude any one of the coordinates
$g_1, \dots, g_K$ and use the remaining $K - 1$ coordinates to denote points of
$Q \cap \mathbb{S}^L$; however, we prefer to use the redundant set of
coordinates to preserve symmetry of equations.
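
The change of coordinates in equation (6) can be illustrated as follows. The
sketch assumes a hypothetical 2-dimensional basis for $J = 3$ binary variables
whose vectors each satisfy conditions (4), and coordinates $g$ that sum to 1;
all numbers and the name `beta_from_g` are invented for the example.

```python
def beta_from_g(basis, g):
    """beta[j][l] = sum over k of basis[k][j][l] * g[k], as in equation (6)."""
    n_vars = len(basis[0])
    return [
        [
            sum(g[k] * basis[k][j][l] for k in range(len(basis)))
            for l in range(len(basis[0][j]))
        ]
        for j in range(n_vars)
    ]

# basis[k][j][l] stands for the component of the k-th basis vector that
# corresponds to outcome l + 1 of variable j + 1 (hypothetical values).
basis = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],
]
g = [0.25, 0.75]  # homogeneous coordinates: non-negative, summing to 1

beta = beta_from_g(basis, g)
```

Because every basis vector satisfies conditions (4) and the coordinates sum to
1, each block of the resulting `beta` again sums to 1, so the point lies in the
polyhedron.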

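Let $\mu_g$ be the measure $\mu_\beta$ written in coordinates $g$. This means
that for every function $\phi$ defined on $Q$ one has
$\int \phi(\beta) \, \mu_\beta(d\beta) = \int \phi(\Lambda g) \, \mu_g(dg)$. In
particular,

(7)   $p_\ell = \int \left( \prod_{j=1}^{J} \beta_{j\ell_j} \right) \mu_\beta(d\beta) = \int \left( \prod_{j=1}^{J} \sum_{k=1}^{K} \lambda^k_{j\ell_j} \cdot g_k \right) \mu_g(dg)$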

Every probabilistic measure on $n$-dimensional Euclidean space may be
considered as a distribution law of an $n$-dimensional random vector. Let
$B = (B_{jl})_{jl}$ be a random vector corresponding to measure $\mu_\beta$ and
let $G = (G_k)_k$ be a random vector corresponding to measure $\mu_g$. In fact,
$B$ and $G$ are the same random vector, but written in different coordinates.

It might be shown that $X_1, \dots, X_J$ and $G$ (or $B$) have a joint
distribution; thus, one can speak about conditional probabilities and
conditional expectations.

Some moments of order $J$ of $B$ coincide with elementary probabilities (1).
Namely, due to (5),

(8)   $M_\ell(B) = \int \left( \prod_{j=1}^{J} \beta_{j\ell_j} \right) \mu_\beta(d\beta) = p_\ell$

The above equation may be extended to moments of order lower than $J$. To
proceed, we need to extend the $\ell$-notation. From now on, we allow 0's in
some positions of vector $\ell$. Such 0's mean that we "do not care" about
values of the corresponding random variables. A vector $\ell$ with some
components equal to 0 may also be thought of as the set of all response
patterns that have arbitrary values in the "do not care" places and coincide
with $\ell$ in all other places. Then $p_\ell$ will be a marginal probability,
and the corresponding moments of $B$ will be:

(9)   $M_\ell(B) = \int \left( \prod_{j : \ell_j \ne 0} \beta_{j\ell_j} \right) \mu_\beta(d\beta) = p_\ell$

The set of moments $M_\ell(B)$ for all $\ell$ (including $\ell$ with zeros) is
all that is directly estimable from the observations.
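
A sample counterpart of the moments with "do not care" positions is the share
of individuals who agree with the pattern at every non-zero position. The
sketch below illustrates this; the function name and the toy sample are
hypothetical.

```python
def marginal_frequency(responses, pattern):
    """Share of responses matching `pattern` at all positions where it is non-zero."""
    hits = sum(
        all(p == 0 or p == r for p, r in zip(pattern, resp))
        for resp in responses
    )
    return hits / len(responses)

# Hypothetical sample: N = 4 individuals, J = 3 binary measurements.
sample = [(1, 1, 2), (1, 2, 2), (2, 1, 1), (1, 1, 1)]

m_100 = marginal_frequency(sample, (1, 0, 0))  # only the first variable matters
m_110 = marginal_frequency(sample, (1, 1, 0))  # first two variables matter
m_000 = marginal_frequency(sample, (0, 0, 0))  # no restriction at all
```

The all-zero pattern always has frequency 1, and adding non-zero positions can
only decrease the frequency.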

Another set of values of interest is the set of conditional moments of order
$v = (v_1, \dots, v_K)$ of random vector $G$:

(10)   $g^v_\ell = E(G_1^{v_1} \cdot \ldots \cdot G_K^{v_K} \mid X = \ell)$

Here $E$ denotes the expectation, and $X = \ell$ is an abbreviation for the
conjunction of conditions $X_j = \ell_j$ for all $j$ such that $\ell_j \ne 0$.
Note that the values $g^v_\ell$ depend on the choice of the basis
$\lambda^1, \dots, \lambda^K$.

These conditional moments express the knowledge regarding individuals that
can be obtained from the measurements. In particular, conditional expectations
(equation (11) below) may be considered as estimators of individual coordinates
in state space (see also section 7).

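Among all conditional moments, moments of order 1, or conditional expectations,
have special importance, and we use special notation for them:

(11)   $g_{\ell 1} \stackrel{\mathrm{def}}{=} g_\ell^{(1,0,\dots,0)} = E(G_1 \mid X = \ell)$
       $\dots$
       $g_{\ell K} \stackrel{\mathrm{def}}{=} g_\ell^{(0,\dots,0,1)} = E(G_K \mid X = \ell)$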

The above values satisfy the following equation (Kovtun et al., 2005, section
6.2):

(12)   $M_\ell(B) \cdot \left( \lambda^1_{jl} \cdot g_\ell^{v^1} + \dots + \lambda^K_{jl} \cdot g_\ell^{v^K} \right) = M_{\ell'}(B) \cdot g^v_{\ell'}$

Here: (a) $v^k$ denotes the vector $v$ with its $k$th component increased by 1
(for example, if $v = (1, 3, 2, 1)$, then $v^3 = (1, 3, 3, 1)$), (b) response
pattern $\ell$ must have 0 at the $j$th position, (c) $\ell'$ denotes the
response pattern obtained from $\ell$ by replacing the 0 at the $j$th position
by $l$ (for example, if $\ell = (1, 0, 0, 2, 1)$, $j = 3$ and $l = 1$, then
$\ell' = (1, 0, 1, 2, 1)$). Equation (12) holds for every $j$, $l$, $v$, and
every $\ell$ containing 0 at the $j$th place.
A special case of equation (12) when $v = (0, \dots, 0)$ is:

(13)   $M_\ell(B) \cdot \left( \lambda^1_{jl} \cdot g_{\ell 1} + \dots + \lambda^K_{jl} \cdot g_{\ell K} \right) = M_{\ell'}(B)$

The right-hand side of this equation does not involve $g$ because
$g_{\ell'}^{(0,\dots,0)} = 1$.

By combining all equations (12) for all possible $v$ and $\ell$ with
normalization equations (like $\sum_k g_{\ell k} = 1$) one obtains the main
system of equations (see Kovtun et al., 2005, section 7).
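
Equation (13) can be checked numerically in a simple setting. The sketch below
assumes a discrete mixing distribution (finitely many points $g$ with weights),
so that all integrals become sums; the basis, points, weights and function
names are hypothetical and serve only to illustrate the identity.

```python
from math import prod

# Hypothetical 2-dimensional model with J = 3 binary measurements.
# basis[k][j][l - 1] plays the role of lambda^k_{jl}.
basis = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],
    [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],
]
# Discrete mixing distribution: points g (each summing to 1) and weights.
points = [(0.1, 0.9), (0.4, 0.6), (0.8, 0.2)]
weights = [0.2, 0.5, 0.3]

def beta(g, j, l):
    """One component of equation (6): sum over k of lambda^k_{jl} * g_k."""
    return sum(gk * lam[j][l - 1] for gk, lam in zip(g, basis))

def likelihood(g, pattern):
    """Product of beta_{j, l_j} over the positions where l_j is non-zero."""
    return prod(beta(g, j, l) for j, l in enumerate(pattern) if l != 0)

def moment(pattern):
    """M_l(B) as in equation (9), for the discrete mixing distribution."""
    return sum(w * likelihood(g, pattern) for g, w in zip(points, weights))

def cond_expectation(pattern, k):
    """E(G_k | X = l) as in equation (11); k is 0-based here."""
    num = sum(w * g[k] * likelihood(g, pattern) for g, w in zip(points, weights))
    return num / moment(pattern)

pattern = (1, 0, 2)        # has 0 at the second position
j, l = 1, 2                # 0-based position of the 0, and the outcome put there
pattern_prime = (1, 2, 2)  # the same pattern with the 0 replaced by l

lhs = moment(pattern) * sum(
    basis[k][j][l - 1] * cond_expectation(pattern, k) for k in range(len(basis))
)
rhs = moment(pattern_prime)
```

Up to floating-point rounding, `lhs` and `rhs` agree, and the conditional
expectations for a fixed pattern sum to 1, which is the normalization equation
mentioned above.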

The important property of the main system of equations is given by the following

Theorem 3.1. Let $M_\ell(B)$ be moments of a distribution generated by a
$K$-dimensional LLS model. Let $\lambda^1, \dots, \lambda^K$ be any basis of the
supporting subspace $Q$ and let $g^v_\ell$ be conditional moments calculated
with respect to this basis.

Then $\lambda^k$ and $g^v_\ell$ give a solution of the main system of equations
with coefficients $M_\ell(B)$.

Moreover, for almost all (in the strict mathematical sense) distributions every 
solution of the main system of equations is a basis of the supporting subspace and 
conditional moments, calculated with respect to this basis. This implies that LLS 
model is almost surely identifiable; we discuss this fact in more detail in the next 
section. 

Note that equations (12) are linear with respect to variables $g^v_\ell$. Thus,
if one knows a basis of $Q$, it is sufficient to solve a linear system of
equations to find the conditional moments.

The supporting space $Q$ can be found independently of the conditional moments
by means of analysis of a moment matrix. Elements of a moment matrix are
moments $M_\ell(B)$; rows of the moment matrix are indexed by response patterns
$\ell$ having exactly one non-zero component; columns of the moment matrix are
indexed by all possible response patterns. The index of an element of the
moment matrix lying in the intersection of row $\ell'$ and column $\ell''$ is
$\ell' + \ell''$. However, addition of response patterns is possible only if in
every position either the first or the second summand has a 0 (for example,
$(1, 0, 0) + (0, 2, 1) = (1, 2, 1)$, but $(1, 0, 0) + (2, 0, 1)$ is undefined);
thus, some elements of the moment matrix are undefined. The reason for some
components being undefined is that we do not have the possibility of performing
a measurement on an individual multiple times independently, and since
individuals are heterogeneous (have different probabilities of outcomes of
measurements), we do not have multiple realizations of independent identically
distributed random variables. In the example below, such components are shown
by question marks.

Figure 1 gives an example of (a part of) a moment matrix for the case $J = 3$,
$L_1 = L_2 = L_3 = 2$. Columns in this matrix correspond to $\ell = (000)$,
$(100)$, $(200)$, $(010)$, $(020)$, $(001)$, $(002)$, $(110)$; other columns are
not shown.

Figure 1. Example of moment matrix

For small $J$ (as in the example) one has a large fraction of undefined
components in the moment matrix. For large $J$ this fraction rapidly decreases.
The main fact with respect to the moment matrix is:

Theorem 3.2. If a distribution is generated by a $K$-dimensional LLS model with
supporting subspace $Q$, then the moment matrix has a completion such that
every one of its columns belongs to $Q$.
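
The rule for adding response patterns, and the share of undefined cells, can be
sketched as follows. To keep the example small, the count is restricted to the
first $|L| + 1$ columns (the all-zero pattern and the patterns with exactly one
non-zero component); this restriction and all function names are assumptions of
the illustration.

```python
def add_patterns(a, b):
    """Sum of two response patterns, or None if both are non-zero somewhere."""
    if any(x != 0 and y != 0 for x, y in zip(a, b)):
        return None
    return tuple(x + y for x, y in zip(a, b))

def single_patterns(L):
    """All patterns with exactly one non-zero component (the row index set)."""
    n_vars = len(L)
    return [
        tuple(l if i == j else 0 for i in range(n_vars))
        for j in range(n_vars)
        for l in range(1, L[j] + 1)
    ]

def undefined_fraction(L):
    """Share of undefined cells among rows x first |L| + 1 columns."""
    rows = single_patterns(L)
    cols = [tuple(0 for _ in L)] + rows
    cells = [add_patterns(r, c) for r in rows for c in cols]
    return sum(c is None for c in cells) / len(cells)

frac_small = undefined_fraction([2, 2, 2])   # J = 3 binary variables
frac_large = undefined_fraction([2] * 10)    # J = 10 binary variables
```

For three binary variables 12 of the 42 cells are undefined, while for ten
binary variables only 40 of 420 are, in line with the remark that the fraction
shrinks as $J$ grows.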

We show below that a completion of a moment matrix is almost surely deter- 
mined by its available part. Thus, the main system of equations almost surely has 
a unique solution. 

Here we need to clarify what we mean when we say "uniqueness of solution."
The supporting subspace $Q$ and the identifiable properties of the mixing
distribution on this subspace are defined uniquely. One way to describe the
subspace $Q$ is to present its basis, and this can be done in infinitely many
ways. The coordinate expression of the mixing distribution does depend on the
choice of basis, and its characteristics (like moments) also depend on this
choice. Such dependencies are governed by tensor laws (Kovtun et al., 2005,
section 6.4).

The existence of infinitely many bases does not mean that we have infinitely 
many solutions; rather, we can describe a unique solution by infinitely many means. 
Various bases of Q provide different points of view on the same underlying picture. 
The ability to choose a basis may benefit the applied researcher, as it may help to
present the phenomenon under consideration more clearly.

The phase of finding the supporting subspace in LLS analysis is tightly related
to the principal component analysis of the mixing distribution. In fact, the
submatrix of the moment matrix consisting of columns from 2 to $|L| + 1$ is
(modulo incompleteness) a shifted covariance matrix of the mixing distribution.
Theorem 3.2 corresponds to the fact that a multidimensional distribution is
supported by an $m$-dimensional linear manifold if and only if the rank of the
covariance matrix is $m$.

Theorem 3.2 also provides a method for determining whether an LLS model
exists for a particular dataset. One has to find the largest computational rank
(Forsythe et al., 1977) of minors of the moment matrix containing no question
marks; if it is sufficiently smaller than $|L|$, an LLS model exists (the exact
criterion of identifiability of the LLS model is given by theorem 4.4).

Now we are ready to describe a method for estimation of parameters of LLS 
model. 

First, the supporting subspace is estimated from the moment matrix. The
method of estimation is very similar to the one used in principal component
analysis, adapted to handle incompleteness of the moment matrix. The detailed
description of the numerical procedure is the subject of another article.

Second, a basis of the supporting subspace is chosen and conditional moments
$g^v_\ell$ are estimated by (approximately) solving the main system of
equations. Note that moments can be found only for $\ell$ having sufficiently
many 0's (to guarantee that there are sufficiently many equations (12)).
Moments for other $\ell$'s can be estimated as an average of directly estimable
moments (for example,
$g_{(\ell_1, \dots, \ell_J)} = \frac{1}{J} \left( g_{(0, \ell_2, \dots, \ell_J)} + \dots + g_{(\ell_1, \dots, \ell_{J-1}, 0)} \right)$).

This is a nonparametric approach, i.e. inference of properties of a (mixing)
distribution is made without any additional assumptions about the structure of
the distribution. If the nature of the applied problem justifies an assumption
that the mixing distribution belongs to some parametric family (say, the mixing
distribution is a Dirichlet distribution), the parameters of such a
distribution may be easily estimated by the method of moments.

The mixing distribution in LLS analysis can be estimated in the style of an
empirical distribution, by letting the estimate of the mixing distribution be
concentrated in points $\bar{g}_\ell$ with weights $\bar{M}_\ell(B)$ (where
bars denote estimates of the corresponding values). It might be shown that this
estimated distribution converges to the true one when both the size of the
sample and the number of measurements tend to infinity, but the proof is
outside the scope of the present paper.

4. Identifiability of LLS models 

Identifiability of a parameter of a model means that the value of the parameter
is uniquely determined by the distribution of the observed variables
(Gabrielsen, 1978). In our case, the observed distribution is given by moments
$M_\ell(B)$. Thus, identifiability means that the values of other parameters
(i.e., supporting subspace and conditional moments) are uniquely determined by
the values $M_\ell(B)$.

We start with the discussion of identifiability of the supporting subspace $Q$.
The covariance matrix of the mixing distribution uniquely defines $Q$ ($Q$ is
spanned by the vector of expectations of $\beta_{jl}$, which is the first
column of the moment matrix, and the eigenvectors of the covariance matrix
corresponding to non-zero eigenvalues). Thus, the supporting subspace is
identifiable if the covariance matrix (which is incomplete for the same reasons
as the moment matrix) can be uniquely restored from the available moments.

Lemma 4.1. Let $\mathcal{C}_m$ be a class of covariance matrices of
$n$-dimensional distributions of rank $m$. Let, for arbitrary
$A \in \mathcal{C}_m$, $\tilde{A}$ denote the matrix $A$ with missing diagonal
blocks of maximal size $p$. Let us also assume that the inequality
$2m + 2p - 1 < n$ holds.

Then, for arbitrary $A, B \in \mathcal{C}_m$, the equality
$\tilde{A} = \tilde{B}$ almost surely implies $A = B$.

Outline of the proof. Figure 2 demonstrates how an incomplete covariance matrix
can be restored. If the minor $(c^i_j)_{ij}$ is nondegenerate, there exists a
unique linear combination of columns $c^1, \dots, c^m$ that yields column $b$;
let $b_j = \sum_{i=1}^{m} \alpha_i c^i_j$ for all $j$. Then, as the rank of the
whole matrix is $m$, the element in the top left corner of the matrix must be
$\sum_{i=1}^{m} \alpha_i a_i$. Other elements denoted by question marks may be
restored by applying a similar procedure.

The picture also illustrates the necessity of the condition $2m + 2p - 1 < n$.

Thus, to complete a proof of the lemma, it is sufficient to show that all minors
of a covariance matrix are almost surely nondegenerate.

Every covariance matrix $A$ of rank $m$ is nonnegative definite and symmetric;
thus, it can be represented in the form $A = O^T D O$, where $O$ is an
orthogonal matrix and $D$ is a diagonal matrix with exactly $m$ nonzero
elements. Further, almost every orthogonal matrix may be represented as
$O = (I + V)(I - V)^{-1}$, where $I$ is the unit matrix and $V$ is a
skew-symmetric matrix (Cayley parametrization; seeEiS 119751 IV.6). Thus,
elements of the covariance matrix can be represented as ratios of polynomials
of $\frac{n(n-1)}{2} + m$ variables ($\frac{n(n-1)}{2}$ elements of the
skew-symmetric matrix and $m$ elements of matrix $D$). Consequently, all minors
of order $m$ are ratios of polynomials of these variables, and they are not
identically 0 (as it is possible to give an example of a covariance matrix of
rank $m$ with all minors of order $m$ being nondegenerate). But the set of
zeros of a polynomial that is not identically 0 has measure 0, q.e.d. □
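
The restoration step from the outline of the proof can be tried on a small
example. The sketch below is an illustration under stated assumptions, not the
authors' procedure: it builds a hypothetical rank-2 matrix $A = F F^T$ with
$n = 6$, hides the diagonal (missing blocks of size $p = 1$, so
$2m + 2p - 1 = 5 < 6$), and recovers the top-left element from off-diagonal
entries only, using exact rational arithmetic and Cramer's rule for the
$2 \times 2$ system.

```python
from fractions import Fraction

# Hypothetical factor matrix; A = F F^T is symmetric, nonnegative definite, rank 2.
F = [[1, 2], [0, 1], [1, 1], [2, 1], [1, 3], [3, 1]]
n = len(F)
full = [
    [Fraction(sum(a * b for a, b in zip(F[i], F[j]))) for j in range(n)]
    for i in range(n)
]
# Incomplete matrix: diagonal elements are unknown.
observed = [[None if i == j else full[i][j] for j in range(n)] for i in range(n)]

def restore_top_left(A):
    """Recover A[0][0] from known entries, assuming rank 2.

    Columns 1 and 2 play the role of c^1, c^2, column 0 the role of b;
    rows 3 and 4 supply a 2 x 2 minor without unknown entries.
    """
    r1, r2 = 3, 4
    a11, a12, b1 = A[r1][1], A[r1][2], A[r1][0]
    a21, a22, b2 = A[r2][1], A[r2][2], A[r2][0]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the chosen minor is degenerate")
    # Coefficients of the linear combination b = alpha1 * c^1 + alpha2 * c^2.
    alpha1 = (b1 * a22 - a12 * b2) / det
    alpha2 = (a11 * b2 - b1 * a21) / det
    # Apply the same combination to the known entries of row 0.
    return alpha1 * A[0][1] + alpha2 * A[0][2]

restored = restore_top_left(observed)
```

If the chosen minor happened to be degenerate, the function raises an error;
the proof argues that this occurs only on a set of measure 0.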

Figure 2. Restoration of a covariance matrix

Corollary 4.2. Let $\mathfrak{M}$ be a family of mixing distributions supported
by $K$-dimensional linear subspaces and let $\nu$ be a Borel measure on
$\mathfrak{M}$. Let $\phi : \mu \mapsto C_\mu$ be a mapping of mixing
distributions to their covariance matrices. Suppose the image measure of $\nu$
under mapping $\phi$ is absolutely continuous with respect to Lebesgue measure.
Then for $\nu$-almost all mixing distributions their supporting subspace is
identifiable.

Remark 4.3. For a finite-dimensional Euclidean space, there is a "standard"
measure (Lebesgue measure), with respect to which "almost surely" statements
are usually made. Unfortunately, there is no such "standard" measure in the
infinite-dimensional case, and in particular there is no "standard" measure on
the space of mixing distributions. However, one can introduce a notion of
"nowhere degenerated" measure and show that every "nowhere degenerated" measure
satisfies the conditions of corollary 4.2. Thus, one can say that the
supporting subspace of the mixing distribution is almost surely identifiable
with respect to any "nowhere degenerated" measure on the space of mixing
distributions.

As lemma 4.1 shows, the supporting subspace of the mixing distribution is
(almost surely) uniquely defined by moments $M_\ell(B)$. This linear subspace
can be described by infinitely many bases. When a basis of the supporting
subspace is chosen, the main system of equations becomes a linear system with
respect to conditional moments $g^v_\ell$. Moreover, it breaks apart into small
subsystems that can be solved separately. For example, conditional expectations
$g_{\ell k}$ satisfy equations (13). If $\ell$ contains zeros at places
$j_1, \dots, j_p$, one has $l(\ell) = L_{j_1} + \dots + L_{j_p}$ equations for
$g_{\ell 1}, \dots, g_{\ell K}$, of which $l(\ell) - p$ are independent. Thus,
$g_{\ell k}$ may be uniquely determined from the system if $K < l(\ell) - p$.
Other conditional moments can be uniquely calculated from the system under
similar conditions.

Summarizing, we obtain 

Theorem 4.4. If the observed distribution is generated by a $K$-dimensional LLS
model with K < J — max Lj + I . Then the LLS model is almost surely identi- 



fiable. 
5. Consistency of LLS estimators 

The consistency of LLS estimators is almost a straightforward corollary of the 
well-known statistical fact that frequencies are consistent and efficient estimators 
of probabilities. 

The supporting subspace is estimated as a $K$-dimensional subspace closest to
the columns of the frequency matrix (more precisely, closest to the subspaces
spanned by incomplete columns). This estimate continuously depends on the
elements of the frequency matrix; thus, it converges to the true supporting
subspace when elements of the frequency matrix converge to the true moments.

Similarly, estimators for conditional moments $g^v_\ell$ are (approximate)
solutions of a linear system with coefficients depending on frequencies. Again,
these estimators continuously depend on frequencies and converge to the true
conditional moments when frequencies converge to the true moments.

The consistency of LLS estimators may be formulated as follows. Suppose we
have a distribution generated by an LLS model with mixing distribution
supported by subspace $Q$ and having conditional moments $g^v_\ell$. Then
estimators $\bar{Q}$ and $\bar{g}^v_\ell$, obtained by the procedure described
above, converge to $Q$ and $g^v_\ell$, respectively, when the size of the
sample tends to infinity. Thus, we have:

Theorem 5.1. If an LLS model is identifiable, it is consistently estimable.

6. Simulation studies 

We have developed a prototype of the algorithm for estimation of LLS
parameters and performed preliminary experiments with it. The implementation of
the algorithm follows the ideas described above, though differing in details
needed to provide computational stability.

The first experiments with the algorithm gave encouraging results. For
illustrative purposes, we chose a 2-dimensional LLS model. As LLS uses
homogeneous coordinates $g = (g_1, g_2)$ and $g_2 = 1 - g_1$, this means that
the mixing distribution can be thought of as a distribution over the interval
$g_1 \in [0, 1]$. The results of four experiments are presented in figures
3a-d. All experiments were organized as follows.

We randomly generated 2 basis vectors of the supporting subspace. Figure 3a
is a case with 300 binary questions (i.e., $J = 300$, $L_j = 2$ for all $j$,
$|L| = 600$); figures 3b-d are cases with 1000 binary questions (i.e.,
$J = 1000$, $L_j = 2$ for all $j$, $|L| = 2000$). Then, we chose a mixing
distribution. In figures 3a,b the mixing distribution is concentrated at two
points, 0.1 and 0.4; in figure 3c the mixing distribution is uniform over the
subinterval [0.2, 0.7]; and in figure 3d it is uniform over two subintervals,
[0, 0.2] and [0.5, 0.8]. Then we generated a sample of 10,000 individuals by
randomly choosing a point in the supporting subspace in accordance with the
mixing distribution and generating responses with probabilities defined by the
selected point (using equation (6)).
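
The data-generating step described above can be sketched as follows. This is a
scaled-down illustration (20 questions and 200 individuals rather than the
sizes used in the experiments), and the function names, the seed and the way
the random basis is drawn are assumptions of the example, not a description of
the authors' prototype.

```python
import random

def simulate(basis, draw_g1, n_individuals, rng):
    """Simulate binary responses from a 2-dimensional LLS model.

    basis[k][j] is the probability of outcome 1 for question j at basis
    vector k; draw_g1(rng) draws the first homogeneous coordinate g_1.
    """
    data = []
    for _ in range(n_individuals):
        g1 = draw_g1(rng)
        g = (g1, 1.0 - g1)
        row = []
        for j in range(len(basis[0])):
            # Probability of outcome 1 at the selected point of the subspace.
            p1 = g[0] * basis[0][j] + g[1] * basis[1][j]
            row.append(1 if rng.random() < p1 else 2)
        data.append(tuple(row))
    return data

def two_point_mixing(rng):
    """Mixing distribution concentrated at 0.1 and 0.4 with equal weights."""
    return rng.choice([0.1, 0.4])

rng = random.Random(0)
n_questions = 20
basis = [[rng.random() for _ in range(n_questions)] for _ in range(2)]
sample = simulate(basis, two_point_mixing, 200, rng)
```

Other mixing distributions, such as a uniform one on a subinterval, can be
plugged in by passing a different `draw_g1` function.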

The set of responses was used as input to the algorithm. The algorithm
estimated the supporting subspace and the conditional moments of the mixing
distribution, and then estimated the mixing distribution itself. Histograms of
the restored distribution are shown in figures 3a-d; a solid line in the
figures shows the true mixing distribution used to generate samples.






M.KOVTUN, I.AKUSHEVICH, K.G.MANTON, AND H.D.TOLLEY 



[Panels a)-d): histograms of the restored mixing distribution over $g_1$.]

Figure 3. Restored mixing distribution (see text for further explanations).



Experiments with different choices of supporting subspace and other randomly 
generated samples give similar results. 

Figures 3a-d demonstrate a good quality of restoration of the mixing distribution 
under various conditions. Figures 3a-b show that the precision of restoration of 
the mixing distribution improves as the number of variables increases. 

It is interesting to compare the results of the LLS model with those of a latent 
class model (LCM). In cases 3a and 3b, LCM would restore the same picture as 
LLS does, and thus it can be an alternative to the LLS model. However, in cases 
3c and 3d, LCM may be used only as an approximation, and it might be shown that 
LCM may involve approximately 1,000 (the number of measurements) latent classes 
in these cases. 



7. Discussion 

In the present paper we described a new class of models for analyzing 
high-dimensional categorical data, which belongs to the family of latent 
structure models. We established conditions for identifiability of the models 
and consistency of the parameter estimators. The essence of our approach is the 
treatment of the space of independent distributions (which is the space of 
distributions being mixed in latent structure analysis) as a linear space. This 
allows us, first, to formulate model assumptions in the language of linear 
algebra, and second, to reduce the estimation of model parameters to a sequence 
of linear algebra problems. The very modest identifiability conditions 
(theorem 4.4) allow application of these models to a wide range of practical 
datasets. 

This linear-algebra approach allows us to clarify the relationship between 
various branches of latent structure analysis. Consider, for example, the 
relation between LLS models and latent class models (LCM). In geometric 
language, latent classes are points in the space of independent distributions. 
If an LCM with classes $c_1, \dots, c_m$ exists for a particular dataset, then 
an LLS model also exists, and its supporting subspace is the linear subspace 
spanned by the vectors $c_1, \dots, c_m$. Thus, the dimensionality of the LLS 
model never exceeds the number of classes in the LCM. These numbers are equal 
if and only if the LCM classes are points in general position (i.e., the vectors 
$c_1, \dots, c_m$ do not belong to a subspace of dimensionality smaller than 
$m$). If the LCM classes are not in general position, however, the 
dimensionality of the LLS model may be significantly smaller. For example, it 
is possible to construct a mixing distribution such that (a) it is supported by 
a line (i.e., the dimensionality of the LLS model is 2); (b) there exists an LCM 
with J (the number of variables) classes; (c) there is no LCM with a smaller 
number of classes. (A rigorous proof of the last fact will be given in another 
paper.) On the other hand, LLS can be used to evaluate the applicability of LCM: 
if the mixing distribution in the LLS model has pronounced modality, then an LCM 
is more likely to exist (with the number of classes equal to the number of 
modes). 
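
The relation between the number of LCM classes and the dimensionality of the 
spanned subspace can be illustrated numerically. The following Python sketch 
computes the rank of a matrix whose rows are class vectors; the function name 
`lls_dimension`, the vector length, and the randomly generated class vectors 
are hypothetical and serve only to show the general-position and 
non-general-position cases. 

```python
import numpy as np


def lls_dimension(class_vectors, tol=1e-10):
    """Dimension of the linear subspace spanned by latent class vectors.

    class_vectors has one row per latent class c_1, ..., c_m.
    """
    matrix = np.asarray(class_vectors, dtype=float)
    return int(np.linalg.matrix_rank(matrix, tol=tol))


rng = np.random.default_rng(0)
n_coords = 20  # length of each class vector (hypothetical)

# Five classes in general position: random vectors are linearly independent
# with probability one.
general = rng.uniform(size=(5, n_coords))

# Five classes lying on one line through two points a and b.
a, b = rng.uniform(size=(2, n_coords))
t = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
on_line = np.outer(t, a) + np.outer(1.0 - t, b)

print(lls_dimension(general))  # 5, equal to the number of classes
print(lls_dimension(on_line))  # 2, although there are 5 classes
```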

Perhaps the most important question regarding any kind of model is its 
interpretation. The interpretation depends heavily on the application domain, 
so we are able to give only very general guidelines here. If the application 
domain supports the assumption that individuals in a population may be 
described by points in a state space, and that the probabilities of outcomes of 
measurements depend on individual coordinates in the state space, then this 
state space can be recovered by LLS analysis, and the coordinates of an 
individual in the state space can be estimated from the outcomes of 
measurements. However, the "physical meaning" of the state space, what it means 
"to be in a particular region of the state space", etc., may be discussed only 
in terms of the application domain. 

There is one interesting property of LLS models, which can be characterized 
as partial identifiability. Our method gives consistent estimates for the 
supporting subspace of the mixing distribution and for the conditional moments 
g v t of maximal order v satisfying $|v| = v_1 + \dots + v_K < J$. This is not, 
however, a limitation of the model; rather, it is a limitation of the problem 
itself: if two mixing distributions are supported by the same subspace and have 
the same conditional moments of order $|v| < J$, they will produce the same 
observed moments Mi; thus, these two mixing distributions are indistinguishable 
based on the available data. On the other hand, two mixing distributions that 
have the same moments of order up to J cannot be significantly different: it 
might be shown that the distance between them converges to 0 when J tends to 
infinity. This means that the recovered knowledge about the mixing distribution 
can be made more and more precise by increasing the number of measurements. 
This fact is well recognized in practice; for example, a mathematical test 
based on multiple-choice questions would include several questions regarding, 
say, the addition of fractions to judge student performance on this topic. 

The above problem may also be considered from another side. As mentioned at 
the end of section 2, the mixing distribution can be estimated in the style of 
an empirical distribution, by letting the estimate of the mixing distribution be 
concentrated in points gn with weights Mi(B). One can ask how this estimate 
relates to the true mixing distribution. The answer is that the estimate 
converges to the true mixing distribution when both the sample size and the 
number of measurements tend to infinity. This fact may be considered an 
analogue of the Glivenko-Cantelli theorem. The fact that the estimate of the 
individual position in the state space becomes more and more precise as the 
number of measurements increases is an analogue of Bernoulli's law of large 
numbers. The fact that one needs more and more measurements on each individual 
to increase the precision of restoration of the mixing distribution does not 
diminish the usefulness of LLS analysis. It is well recognized that achieving a 
required precision in statistical inference requires sufficiently many 
measurements. The difference here is that one needs not only to repeat the same 
measurement on different individuals, but also to perform sufficiently many 
measurements on each individual. The proof of the above convergence and the 
estimation of its rate are the subject of forthcoming papers. 
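
The law-of-large-numbers effect for an individual's position can be illustrated 
with a small simulation. The Python sketch below is not the LLS estimator: it 
assumes a 2-dimensional model with known basis vectors `a` and `b` (both 
hypothetical and randomly generated), assumes Pr(X_j = 1 | g) = g_1 * a_j + 
(1 - g_1) * b_j, and uses a simple least-squares estimate of g_1 as a stand-in. 
It reports the root-mean-square error of that estimate for several numbers of 
questions. 

```python
import numpy as np


def position_rmse(n_questions, g1_true=0.3, n_replications=500, seed=0):
    """RMSE of a least-squares estimate of g_1 from n_questions binary responses."""
    rng = np.random.default_rng(seed)
    # Known basis vectors (hypothetical): Pr(X_j = 1) under each of the two.
    a = rng.uniform(size=n_questions)
    b = rng.uniform(size=n_questions)
    probs = g1_true * a + (1.0 - g1_true) * b
    # Each row is one independent replication of the same individual's responses.
    x = (rng.uniform(size=(n_replications, n_questions)) < probs).astype(float)
    # Least-squares fit of x_j - b_j = g_1 * (a_j - b_j) + noise.
    d = a - b
    g1_hat = (x - b) @ d / (d @ d)
    return float(np.sqrt(np.mean((g1_hat - g1_true) ** 2)))


for n in (50, 200, 1000):
    print(n, round(position_rmse(n), 3))
```

Under these assumptions the variance of the estimate is bounded by 
0.25 / sum_j (a_j - b_j)^2, which shrinks as questions are added, so the 
printed error should decrease with the number of questions. 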